Learning of any kind could be readily discerned in only two environments: Gridworld and one of the Mountain car domains (MountainCar1).

Although the agent rarely, if ever, found its way to the goal in Gridworld, it did learn that it could end the episode quickly, and thus save on living cost, by heading as directly as possible towards the green adversary and getting eaten. Since the adversary started in the goal zone, the agent learned to move in that direction. In the version where the adversary could walk through walls, the agent quickly learned to move straight to the right and meet the adversary halfway.
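The incentive to get eaten can be made concrete with a small return computation. The sketch below is only an illustration under assumed values: the living cost of $-1$ per step, the penalty of $-10$ for being caught and the $200$-step episode limit are hypothetical and are not taken from our Gridworld configuration.

```python
# Hypothetical reward settings; the actual Gridworld values may differ.
LIVING_COST = -1.0      # reward on every time step (assumed)
EATEN_PENALTY = -10.0   # extra reward when the adversary catches the agent (assumed)
MAX_STEPS = 200         # step limit that cuts off an episode (assumed)


def episode_return(steps, eaten):
    """Undiscounted return of an episode lasting `steps` time steps."""
    total = LIVING_COST * steps
    if eaten:
        total += EATEN_PENALTY
    return total


# Walk straight towards the adversary and get caught after 8 steps.
rush = episode_return(8, eaten=True)
# Wander without reaching the goal until the step limit ends the episode.
wander = episode_return(MAX_STEPS, eaten=False)

print(rush, wander)  # -18.0 -200.0
```

Under these assumptions, an agent that never finds the goal maximises its return by ending the episode as early as possible, even at the price of the penalty for being eaten.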

In MountainCar1, the algorithm performed much as it did in Gridworld, albeit on a different time scale. Typically, a few hundred time steps were enough to reach the goal.

Progress was poorer in the other Mountain car environment (MountainCar2). Whenever the agent reached the goal and learned something from it, it would try to repeat the same actions from the very start of the next episode. Usually, however, this failed, and the agent instead spent tens of thousands of steps trying slight variations of that tactic until it had been unlearnt. We believe this might be a side effect of normalising the radial basis function network, or of over-generalisation. The latter could be alleviated by fine-tuning the parameters, but we did not manage to find a good combination for this task.
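One way normalisation can lead to over-generalisation is that normalised radial basis functions do not fade out away from their centres: far from all centres, the nearest one receives almost all of the weight, so whatever was learned there is copied to distant states. The minimal sketch below illustrates this with Gaussian basis functions; the one-dimensional layout, the centres, the width and the weights are hypothetical and do not reproduce the network used in our experiments.

```python
import numpy as np


def rbf_features(x, centres, width, normalise=False):
    """Gaussian RBF activations of state x; if normalise, they sum to 1."""
    sq_dist = np.sum((centres - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * width ** 2)
    if normalise:
        # Normalised RBFs are a softmax over the logits; shifting by the
        # maximum avoids 0/0 when x is far from every centre.
        phi = np.exp(logits - logits.max())
        return phi / phi.sum()
    return np.exp(logits)


# Hypothetical 1-D layout and weights (not the settings used in the paper).
centres = np.array([[0.0], [0.5], [1.0]])
width = 0.1
weights = np.array([0.0, 0.0, -50.0])  # e.g. a value learned near x = 1.0

far = np.array([3.0])  # a state far from every centre
raw = rbf_features(far, centres, width)
norm = rbf_features(far, centres, width, normalise=True)

print(weights @ raw)   # close to 0: plain RBFs fade out away from the centres
print(weights @ norm)  # close to -50: normalised RBFs copy the nearest centre
```

With plain RBFs, the estimate at the distant state stays near zero, whereas the normalised features extend the value learned at the edge centre to every state beyond it.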

\figurename~\ref{fig:gridWorld} and \figurename~\ref{fig:mountainCar1} show the number of steps per episode and the total reward per episode for $100$ episodes of Gridworld and MountainCar1, respectively, while
\figurename~\ref{fig:mountainCar2} shows the same for $3$ episodes of MountainCar2.

\begin{figure}[!t]
    \centering
    \includegraphics[width=\linewidth]{figures/gridWorld2RewardSteps}
    \caption{Number of steps and total reward per episode over $100$ episodes of Gridworld.}
    \label{fig:gridWorld}
\end{figure}
\begin{figure}[!t]
    \centering
    \includegraphics[width=\linewidth]{figures/montainCarRewardSteps}
    \caption{Number of steps and total reward per episode over $100$ episodes of MountainCar1.}
    \label{fig:mountainCar1}
\end{figure}

\begin{figure}[!t]
    \centering
    \includegraphics[width=\linewidth]{figures/montainCarEnv2RewardSteps}
    \caption{Number of steps and total reward per episode over $3$ episodes of MountainCar2.}
    \label{fig:mountainCar2}
\end{figure}

